Method and apparatus for memory chip row hammer threat backpressure signal and host side response

ABSTRACT

A memory chip is described. The memory chip includes row hammer threat detection circuitry. The memory chip includes an output. The memory chip includes backpressure signal generation circuitry coupled between the row hammer threat detection circuitry and the output. The backpressure signal generation circuitry is to generate a backpressure signal to be sent from the output in response to detection by the row hammer threat detection circuitry of a row hammer threat.

RELATED CASES

This application claims the benefit of U.S. Provisional Application No. 63/183,509, entitled “Method and Apparatus For Row Hammer Recovery”, filed May 3, 2021, which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

The field of invention pertains to the computing sciences generally, and, more specifically, to a method and apparatus for a memory chip row hammer threat backpressure signal and host side response.

BACKGROUND

A Dynamic Random Access Memory (DRAM) cell stores charge in a capacitive cell. During a standby mode (when there is no access to the cell), charge can continually leak from a cell to the point where its stored value is changed (e.g., from a 1 to a 0).

In order to prevent such data loss, a DRAM memory chip is designed to refresh its storage cells. The refresh activity typically entails reading from a cell to detect its stored value and then writing the same value back into the cell. The write operation replenishes the cell with a fresh amount of charge for the particular stored value.

In order to guarantee the integrity of its data over an extended runtime, a memory chip will periodically refresh its storage cells. Specifically, each cell in the memory chip's cell array will be refreshed with sufficient frequency to prevent the loss of its stored data even if the cell is infrequently accessed.

A recently published Joint Electron Device Engineering Council (JEDEC) standard, double data rate 5 (“DDR5”), defines cooperative refreshing behavior between a memory chip and the host (memory controller). Specifically, a memory chip manufacturer defines (in mode register (MR) space of a memory chip) certain timing requirements related to the refreshing of the cells in the memory chip's cell array.

The memory controller reads the timing requirements and schedules REFRESH commands according to a schedule that is consistent with the timing requirements. The memory controller then issues REFRESH commands to the memory chip consistent with the schedule. In response to each REFRESH command, the memory chip refreshes cells at a granularity specified by the type of REFRESH command it receives (all banks in a particular bank group, or the same bank in all bank groups).

DRAM memory cells can also suffer from a data corruption mechanism referred to as “row hammer”. In the case of row hammer, data can be corrupted in cells that are coupled to rows that are near (e.g., next to) a row that is frequently activated. As such, memory systems ideally include counters that monitor row activations. If a row is deemed to have received a number of activations over a time window that exceeds a threshold, the cells that are coupled to the nearby rows are pro-actively refreshed to protect them against the row hammer effect.

The JEDEC DDR5 standard includes a row hammer mitigation approach referred to as “refresh management”. In the case of refresh management, the memory controller counts row activations per bank. If the count for a bank exceeds a threshold specified by the memory chip manufacturer, the memory controller issues refresh management (RFM) commands to the memory chip.

In response to each RFM command, the memory chip refreshes cells at a granularity specified by the type of RFM command it receives (all banks in a particular bank group, or same bank in all bank groups). Notably, refreshes performed in response to RFM commands are additional refreshes beyond the normal scheduled refreshes that are implemented with REFRESH commands as described above.

The DDR5 standard also provides a mechanism for a memory chip to communicate the existence of a problem back to the memory controller. A typical DDR5 implementation, as depicted in FIG. 1, includes a pair of sub-channels 101_1, 101_2 coupled between a memory controller 102 and a memory module 103. For simplicity, the constituent memory module components for only one of the sub-channels are depicted. As observed in FIG. 1, the constituent components for a sub-channel include first and second ranks of memory chips 104_1, 104_2 that send/receive data to/from the memory controller 102 and a register clock driver (RCD) chip 105 that receives command and address (C/A) signals from the memory controller.

Each memory chip 106 includes an Alert_n output 107 that is used to signal that a write cyclic redundancy check (CRC) error has occurred (“DQ CRC” write error). Here, DDR5 data transfers are performed in bursts that consume multiple cycles. Actual data is transferred during the earlier cycles and CRC information is transferred during the later cycles. Per write burst, each memory chip in the targeted rank internally calculates its own CRC information from the data it receives and compares it to the received CRC information. If there is a mismatch, the memory chip asserts a flag on its Alert_n output 107 (for ease of drawing, FIG. 1 only labels one memory chip 106 and one Alert_n output 107).

The Alert_n outputs of the memory chips within a same rank are tied together and routed to the RCD chip 105. According to the DDR5 standard, if any memory chip detects that a DQ CRC write error has occurred, the memory chip generates, within 3 to 13 ns of the event, a pulse having a width within a range of 12 to 20 clock cycles (nCK). The RCD chip 105 receives the pulse and re-drives it on the sub-channel's ALERT_N wire 108 to inform the memory controller of the event.

Additionally, the RCD chip 105 is designed to detect parity errors with respect to the sub-channel's received C/A signals. If the RCD chip 105 detects a parity error within the C/A signals, the RCD chip 105 generates a pulse on the sub-channel's ALERT_N wire 108 having a width of 60-120 ns to inform the memory controller of the event.

FIG. 2 depicts the time windows for both the DQ CRC write error and CA parity error pulses. As observed in FIG. 2, both pulses are logic low assertions in that they both start with a falling edge and end with a rising edge. Notably, the minimum pulse width of the CA parity error is longer than the maximum pulse width of the DQ CRC write error so that the memory controller can distinguish which type of error has occurred.
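The non-overlap of the two pulse widths can be checked with simple arithmetic. A DDR clock runs at half the data rate, so the clock period in nanoseconds is 2000 divided by the data rate in MT/s. The sketch below converts the 20 nCK maximum DQ CRC pulse width into nanoseconds and compares it with the 60 ns CA parity minimum; the data rates it loops over are illustrative choices, not values taken from the text.

```python
CA_PARITY_MIN_NS = 60.0  # minimum CA parity pulse width from the text (60-120 ns)
DQ_CRC_MAX_NCK = 20      # maximum DQ CRC pulse width from the text (12-20 nCK)


def nck_to_ns(n_cycles: float, data_rate_mts: float) -> float:
    """Convert a width in clock cycles to ns; DDR moves two data beats per clock."""
    t_ck_ns = 2000.0 / data_rate_mts  # clock period = 1 / (data_rate / 2)
    return n_cycles * t_ck_ns


for rate in (3200, 4800, 6400):  # illustrative data rates in MT/s
    dq_max_ns = nck_to_ns(DQ_CRC_MAX_NCK, rate)
    print(f"{rate} MT/s: DQ CRC max = {dq_max_ns:.2f} ns, "
          f"below CA parity min: {dq_max_ns < CA_PARITY_MIN_NS}")
```

At 3200 MT/s the DQ CRC pulse lasts at most 12.5 ns, well under 60 ns, and the margin only grows at higher data rates because the clock period shrinks.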

FIGURES

A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:

FIG. 1 (prior art) depicts a memory controller coupled to a memory module;

FIG. 2 (prior art) depicts CRC write error and C/A parity error signals;

FIG. 3 shows an improved memory chip;

FIG. 4 shows a backpressure signal;

FIG. 5 shows an improved memory controller;

FIG. 6 shows a system;

FIG. 7 shows a data center;

FIG. 8 shows an environment.

DETAILED DESCRIPTION

Future generation memory chips are expected to be designed to include their own row hammer threat detection circuitry. For example, at least some future generation memory chips are expected to include additional DRAM cells per row that are used to hold that row's activation count. If the row activation count for any particular row reaches a threshold, the memory chip recognizes the existence of a “nominal” row hammer threat.

An extended row hammer threat can arise if the respective activation counts for a number of different rows in a same memory chip each reach their respective thresholds at approximately the same time (“row hammer overload”). In this case, the memory chip is expected to consume considerable time refreshing the collection of rows that are threatened (one set of nearby rows per row whose threshold has been reached).
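The per-row counting and the nominal/overload distinction described above can be modeled in a few lines. This is a toy sketch, not the on-die circuit: `threshold` and `overload_rows` are hypothetical parameters, and "approximately the same time" is simplified to "several rows awaiting neighbor refresh at once".

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RowHammerDetector:
    """Toy model of on-die row hammer threat detection (hypothetical parameters)."""
    threshold: int      # activations per row that signal a threat
    overload_rows: int  # simultaneously threatened rows treated as "row hammer overload"
    counts: Dict[int, int] = field(default_factory=dict)
    pending: Set[int] = field(default_factory=set)  # rows whose neighbors need refresh

    def activate(self, row: int) -> None:
        """Count an ACT to `row`; flag the row once its count reaches the threshold."""
        self.counts[row] = self.counts.get(row, 0) + 1
        if self.counts[row] >= self.threshold:
            self.pending.add(row)

    def threat(self) -> str:
        """Return "none", "nominal" or "overload"."""
        if not self.pending:
            return "none"
        if len(self.pending) >= self.overload_rows:
            return "overload"
        return "nominal"

    def refresh_neighbors(self, row: int) -> None:
        """Model refreshing the rows next to `row`, which clears its threat."""
        self.pending.discard(row)
        self.counts[row] = 0
```

For example, with `threshold=3` and `overload_rows=2`, three activations of one row yield `"nominal"`, three more activations of a second row yield `"overload"`, and refreshing the neighbors of both rows returns the detector to `"none"`.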

Regardless of which type of row hammer threat is detected by the memory chip (a “nominal” row hammer threat or “row hammer overload” threat, collectively referred to hereafter as a “row hammer (RH) threat”, “row hammer (RH) event” or the like), the memory chip should inform the memory controller that it has detected a row hammer threat.

Additionally, particularly in cases where the row hammer threat is not expected to be mitigated until large numbers of refreshes are performed (e.g., row hammer overload), the memory chip should send some kind of row hammer related back pressure signal to the memory controller that informs the memory controller that it is temporarily inadvisable to send any further read, write or activate commands to the memory chip.

An improved memory chip 301 is therefore depicted in FIG. 3 that includes first circuitry 302 to internally detect an RH threat and second circuitry 303 to construct and send an RH back pressure signal from its Alert_n output 304 in response to its internal detection of an RH threat. According to various memory module implementations, consistent with the discussion of FIG. 1 above, the RH back pressure signal is received by an RCD chip and re-driven onto the ALERT_N wire of a sub-channel.

In various embodiments, the structure of the RH back pressure signal is different from both the DQ CRC write error and the C/A parity error so that it can be differentiated from these signals by the memory controller.

Specifically, as observed in FIG. 4, the minimum pulse width for the RH back pressure signal is greater than the maximum pulse width for the CA parity error signal. Here, once a pulse is asserted on the ALERT_N wire with a high to low transition, if the subsequent low to high transition that marks the end of the pulse occurs within the following 60 ns, the memory controller understands that a DQ CRC write error signal is being sent. By contrast, if the subsequent low to high transition occurs after 60 ns but within 120 ns, the memory controller understands that a CA parity error signal is being sent. Finally, if instead the subsequent low to high transition 401 occurs after 120 ns, the memory controller understands that an RH backpressure signal is being sent.

In various embodiments, as depicted in FIG. 4, the minimum pulse width of the backpressure signal is defined to be 150 ns so that there exists a 30 ns pulse width difference between the CA parity error signal and the backpressure signal. Doing so allows for easy differentiation by the memory controller as to whether a CA parity error signal is being sent or if an RH backpressure signal is being sent (if the low to high transition occurs after 60 ns but within 120 ns, the memory controller understands that a CA parity error signal is being sent, or, if the low to high transition 401 occurs at or after 150 ns, the memory controller understands that a backpressure signal is being sent).
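The decode rule described above amounts to a pulse-width classifier on the controller side. The sketch below uses the example thresholds from the text (60 ns, 120 ns, 150 ns); treating a rising edge at exactly 60 ns as a CA parity error, and reporting anything between 120 ns and 150 ns as an undefined pulse, are assumptions of this sketch rather than rules stated in the text.

```python
DQ_CRC_LIMIT_NS = 60.0          # rising edge before 60 ns -> DQ CRC write error
CA_PARITY_MAX_NS = 120.0        # rising edge in [60, 120] ns -> CA parity error
RH_BACKPRESSURE_MIN_NS = 150.0  # rising edge at or after 150 ns -> RH backpressure


def classify_alert_pulse(width_ns: float) -> str:
    """Map a measured ALERT_N low-pulse width to the signal it encodes."""
    if width_ns < DQ_CRC_LIMIT_NS:
        return "dq_crc_write_error"
    if width_ns <= CA_PARITY_MAX_NS:
        return "ca_parity_error"
    if width_ns >= RH_BACKPRESSURE_MIN_NS:
        return "rh_backpressure"
    return "undefined"  # 120-150 ns guard band: no signal is defined here
```

The 30 ns guard band is what makes the last two branches unambiguous: a compliant CA parity pulse can never reach the backpressure minimum.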

In various embodiments, the minimum and/or maximum pulse width definitions for the backpressure signal is/are defined in the memory chip's MR space 305 (e.g., along with pulse width definitions for the DQ CRC write and/or CA parity bit errors).

In further embodiments, the MR space 305 also includes a “partial command block” parameter (Partial_CMD_Block) that defines a time window 402 within which the memory controller is to cease the sending of at least those commands that can aggravate a row hammer threat (read, write, activate). Here, the partial command block time window 402 starts at the leading/falling edge of the ALERT_N pulse and terminates within a time range that is specified in the memory chip's MR space. Within time window 402, the memory controller stops sending at least certain kinds of commands such as read, write and activate commands.

In various embodiments, both minimum and maximum times are specified in MR space 305 for the partial command block window 402. The memory controller is expected to stop sending at least certain commands from the moment it first samples the leading/falling edge of the ALERT_N pulse until a period of time 402 has elapsed that falls within the partial block window's minimum and maximum settings.
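The partial command block behavior can be sketched as two small pieces: picking a block duration within the MR-specified minimum and maximum, and gating commands while the window is open. The class name `PartialCmdBlock`, the command mnemonics, and the choice to let commands outside read/write/activate through are assumptions for illustration; the text only requires that at least read, write and activate commands be held back.

```python
from dataclasses import dataclass

# Commands the text says can aggravate a row hammer threat; blocking only these
# (and letting other commands such as refreshes through) is an assumption here.
BLOCKED_DURING_WINDOW = frozenset({"READ", "WRITE", "ACT"})


@dataclass(frozen=True)
class PartialCmdBlock:
    """Hypothetical decoded Partial_CMD_Block setting from MR space, in ns."""
    min_ns: float
    max_ns: float

    def choose_window(self, preferred_ns: float) -> float:
        """Clamp the controller's preferred block duration into [min_ns, max_ns]."""
        return min(max(preferred_ns, self.min_ns), self.max_ns)


def command_allowed(cmd: str, ns_since_falling_edge: float, window_ns: float) -> bool:
    """True if `cmd` may be issued at this point after the ALERT_N falling edge."""
    in_window = 0 <= ns_since_falling_edge < window_ns
    return not (in_window and cmd in BLOCKED_DURING_WINDOW)
```

With hypothetical settings `PartialCmdBlock(min_ns=70, max_ns=100)`, a preferred duration of 50 ns is raised to 70 ns, and a READ requested 30 ns after the falling edge is held back while a READ at 90 ns into a 80 ns window is allowed.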

Note that designing the memory controller to cease the sending of commands as soon as it observes the leading/falling edge of the ALERT_N pulse does not distinguish between the backpressure signal and the other ALERT_N signals (DQ CRC write error and CA parity error). As such, the partial block time window will also cause cessation of commands if the ALERT_N signal is a DQ CRC write error or a CA parity error. The time window 402 can therefore be advantageously used, for instance, to correct the DQ CRC write error or CA parity error. The memory chip is expected to execute all commands that it receives from the memory controller before the memory controller ceases its sending of commands in response to its sampling of the ALERT_N falling edge.

In various configurations, the settings for the partial command block window 402 are such that within the window 402, or shortly after the window 402 expires, the memory controller is able to determine whether the ALERT_N signal is a DQ CRC write error or a CA parity error. For instance, the partial block window 402 is configured to extend beyond the minimum pulse width for a CA parity error (60 ns from the high to low transition).

By so doing, while the memory controller is blocking the issuing of certain commands within the window 402, the memory controller might actually observe a low to high transition that corresponds to a DQ CRC write error or a CA parity error. If so, the memory controller is free to send any commands it desires after the window 402 expires (the block is lifted).

By contrast, if the pulse width of the ALERT_N signal extends beyond the window 402 and reaches the minimum pulse width 401 for an RH backpressure signal, the memory controller recognizes that an RH backpressure signal is at play and begins to send RFM commands 403 to the memory chip. The sending of the RFM commands 403 gives the memory chip the authority to apply refreshes to mitigate its row hammer threat.

In various embodiments, the memory controller stops sending RFM commands 403 once the memory chip terminates the pulse (a low to high transition is observed), or once the pulse width reaches the maximum pulse width 404 for the RH backpressure signal as specified in the memory chip's MR space (whichever comes first). Here, in various embodiments, the memory chip's backpressure signal circuitry 303 is designed to terminate the Alert_n pulse by way of a rising edge (which the RCD chip re-drives) in response to a signal from the memory chip's row hammer threat detection circuitry 302 that the row hammer threat that prompted the backpressure signal has been mitigated with sufficient refreshes.
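The controller-side RFM window described above can be sketched as a function that lists when RFM commands go out: starting once the pulse has reached the backpressure minimum width, and stopping at the rising edge or the maximum width, whichever comes first. The fixed spacing `interval_ns` is a hypothetical parameter; the text does not specify how closely RFM commands are spaced.

```python
from typing import List, Optional


def rfm_times(pulse_end_ns: Optional[int], min_width_ns: int,
              max_width_ns: int, interval_ns: int) -> List[int]:
    """Times (ns after the ALERT_N falling edge) at which RFM commands are sent.

    pulse_end_ns is when the chip drives the rising edge, or None if the pulse
    is still low when the maximum width is reached.
    """
    stop_ns = max_width_ns if pulse_end_ns is None else min(pulse_end_ns, max_width_ns)
    times = []
    t = min_width_ns  # backpressure is recognized once the pulse reaches its minimum width
    while t < stop_ns:
        times.append(t)
        t += interval_ns
    return times
```

For example, with a 150 ns minimum, a 1000 ns maximum and a hypothetical 100 ns spacing, a chip that releases the pulse at 400 ns receives RFM commands at 150, 250 and 350 ns.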

If instead the memory controller stops sending RFM commands because the maximum specified pulse width 404 has been reached, ideally, the memory chip does not need any more time to mitigate the row hammer threat. As such, in various embodiments, the difference between the minimum and maximum pulse width settings 401, 404 is defined in view of the worst case row hammer threat (e.g., row hammer overload) that is expected to be successfully mitigated.

Refreshes performed in response to the RFM commands 403 sent by the memory controller in response to the backpressure signal are deemed to be additional refreshes beyond the nominal refreshing activity accomplished with REFRESH commands and any refreshing activity accomplished with RFM commands that are triggered from the memory controller's tracking of bank activations (both types of refreshing activities continue in addition to the refreshes performed in response to the row hammer overload signal).

In various embodiments, once the memory chip detects a row hammer threat event, the memory chip updates its MR space 305 to confirm the type of error (e.g., row hammer overload) and/or additional information concerning the error (e.g., which specific rows have reached their respective threshold). The memory controller is then free to read the MR space 305 to confirm and/or better understand the situation at the memory chip 301.

For example, if the MR space 305 indicates which rows are reaching their threshold, the RFM commands 403 that the memory controller sends in response to the backpressure signal can be targeted RFM commands that specify a bank address where the rows' corresponding victim rows reside. In an alternative or combined approach, the MR space 305 indicates the number of RFM commands needed to mitigate the row hammer overload condition. In this case, the memory controller can cease sending RFM commands 403 in response to the backpressure signal after the specified number of RFM commands have been sent (rather than waiting for the pulse to terminate or the maximum pulse width 404 to be reached). In yet another or combined embodiment, the MR space 305 indicates how much time the memory chip needs to spend refreshing before the RH threat is mitigated. In this case, the memory controller will send RFM commands 403 in response to the backpressure signal for the specified amount of time and then cease the sending of the RFM commands 403.
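The three MR-space-driven options above (targeted banks, a requested RFM count, a requested refresh time) can be combined in one planning function. Everything here is a hypothetical model: the `RhStatus` field names stand in for whatever MR encoding a chip would use, `interval_ns` is an assumed fixed RFM spacing, and treating every limit (pulse end, maximum width, count, time) as "whichever comes first" is this sketch's choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RhStatus:
    """Hypothetical decoded MR-space row hammer status (field names are illustrative)."""
    rfm_needed: Optional[int] = None       # number of RFM commands the chip requests
    refresh_time_ns: Optional[int] = None  # refreshing time the chip requests
    victim_banks: Tuple[int, ...] = ()     # banks holding the victim rows


def plan_rfm(status: RhStatus, start_ns: int, interval_ns: int,
             pulse_end_ns: Optional[int], t_max_ns: int) -> List[Tuple[int, Optional[int]]]:
    """Return (time_ns, bank) pairs; a bank of None means an untargeted RFM."""
    stop = t_max_ns if pulse_end_ns is None else min(pulse_end_ns, t_max_ns)
    if status.refresh_time_ns is not None:
        stop = min(stop, start_ns + status.refresh_time_ns)
    banks = status.victim_banks or (None,)
    plan: List[Tuple[int, Optional[int]]] = []
    t = start_ns
    while t < stop:
        if status.rfm_needed is not None and len(plan) >= status.rfm_needed:
            break  # the chip asked for exactly this many RFMs
        plan.append((t, banks[len(plan) % len(banks)]))  # rotate over victim banks
        t += interval_ns
    return plan
```

For instance, a status requesting two RFMs targeted at bank 3 produces `[(150, 3), (250, 3)]` when backpressure is recognized at 150 ns, without waiting for the pulse to end.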

Additionally, in the case of multi-rank memory modules, without reading MR register space, the memory controller may not know which of the ranks are suffering the row hammer threat (because the Alert_n pins of the memory chips from more than one rank are tied to the same RCD input). In this case, the memory controller can be designed to apply row hammer mitigation (cease the sending of read/write/activate commands within window 402 and begin the sending of RFM commands 403) to all of the ranks. If the memory controller reads MR register space, it can understand which rank's memory chips are suffering a row hammer threat and apply row hammer mitigation only to memory chips of the afflicted rank.
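The rank-selection choice above reduces to a small rule: without MR information, mitigate every rank sharing the alert input; with it, mitigate only the flagged ranks. The function name and the `{rank: threat_flag}` dictionary shape for the MR readout are hypothetical.

```python
from typing import Dict, List, Optional


def ranks_to_mitigate(all_ranks: List[int],
                      mr_threat_by_rank: Optional[Dict[int, bool]] = None) -> List[int]:
    """Pick the ranks that receive row hammer mitigation.

    mr_threat_by_rank is a hypothetical per-rank threat flag read from MR space;
    None means the controller has not read MR space.
    """
    if mr_threat_by_rank is None:
        # Alert_n is shared, so the source rank is unknown: mitigate all of them.
        return list(all_ranks)
    return [r for r in all_ranks if mr_threat_by_rank.get(r, False)]
```

The trade-off is latency against throughput: skipping the MR read lets mitigation start immediately, at the cost of stalling ranks that were never threatened.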

Once the memory chip observes that the row hammer threat is mitigated, in various embodiments, the memory chip clears any portion of MR space 305 that was set to indicate the existence of the threat. Alternatively, the memory controller can clear the MR space (via a write to the memory chip's MR space 305) once the memory chip terminates the pulse or the memory controller has sent all permitted RFM commands 403 in response to the assertion of the backpressure signal.

According to the embodiments described above, the memory chip was presumed to issue refreshes (when not in self refresh mode) only in response to REFRESH and RFM commands sent by the memory controller. In alternative or combined implementations, the memory chip 301 includes circuitry to schedule/issue refreshes that it initiates on its own. In such embodiments, the number of RFM commands 403 that are needed by the memory chip may be reduced (e.g., to zero) because the memory chip can initiate enough refreshes internally to mitigate the problem so long as the memory controller ceases sending new read, write or activate commands (e.g., the memory chip can mitigate the threat internally within window 402).

According to yet another approach, the memory controller automatically places the memory chip in self refresh mode in response to the memory chip asserting the backpressure signal (instead of sending RFM commands 403).

According to yet another approach, a read burst includes a meta data bit that is used to indicate whether a row hammer threat exists. For example, one bit is reserved for each memory chip per read transaction (e.g., 0=row hammer threat does not exist; 1=row hammer threat exists). If any meta data bit indicates that a row hammer threat exists, the memory controller issues a series of RFM commands spaced by a read command at the same address. The memory controller continues to operate in this state until the meta data indicates a row hammer threat does not exist.
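The meta data variant can be sketched as a loop that alternates RFM and READ commands until no chip's bit is set. The `read` callable stands in for the memory interface and is hypothetical, as is the `max_iterations` safeguard, which the text does not mention and is added here only so the sketch cannot loop forever against a misbehaving model.

```python
from typing import Callable, List, Sequence, Tuple

Command = Tuple  # ("READ", address) or ("RFM",)


def rh_metadata_loop(read: Callable[[int], Sequence[int]], address: int,
                     max_iterations: int = 1000) -> List[Command]:
    """Issue RFM then READ at the same address until no chip flags a threat.

    `read` returns one meta data bit per memory chip (1 = threat exists).
    """
    issued: List[Command] = [("READ", address)]
    bits = read(address)  # the read whose meta data first reports the threat
    for _ in range(max_iterations):
        if not any(bits):  # every chip reports 0: no row hammer threat
            break
        issued.append(("RFM",))
        issued.append(("READ", address))
        bits = read(address)
    return issued
```

For example, if the first two reads report a threat on one chip and the third read reports none, the controller issues READ, RFM, READ, RFM, READ and then resumes normal operation.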

Although embodiments described above have emphasized a single ALERT_N wire per sub-channel and all memory chips of a same rank or same sub-channel being tied to the same alert wire on the memory module, it is pertinent to recognize that other embodiments that depart from these specific designs can nevertheless incorporate the teachings provided herein. For example, there may be multiple ALERT_N wires per sub-channel (e.g., one such wire per rank, etc.).

FIG. 5 shows a memory controller 501 that has been designed to handle a row hammer backpressure signal (or other kind of backpressure signal) consistent with the teachings provided above. In particular, the memory controller 501 includes an input to receive a signal from an ALERT_N wire that communicates errors/problems from a memory module and/or memory chip according to different pulse widths that are generated on the ALERT_N wire. One of the signals is a backpressure signal such as a row hammer backpressure signal. As such, the memory controller 501 includes circuitry 502 that can differentiate amongst the different signals and detect a backpressure signal.

The memory controller 501 includes command scheduler circuitry 503 that determines which commands are sent to the memory modules/chips that the memory controller 501 is coupled to. The command scheduler logic circuitry 503 is designed to stop sending at least certain kinds of commands (e.g., read, write, activate) and/or send additional refresh commands (e.g., RFM commands) in response to the detection by the memory controller of a backpressure signal that is received at the ALERT_N input.

In still yet other embodiments, rather than using an ALERT_N wire, a backpressure signal is sent via digital communication through some kind of packetized communication link between the memory module and the memory controller. For example, certain wires of the command/address bus, or the data bus (DQ bus), may be used to send packets back to the memory controller from a memory chip. Such packets may include a header and/or payload that corresponds to a backpressure signal as described above. The packetized communication link may be formed with a single wire or multiple wires depending on implementation. In the case of a multiple wire implementation, the terms “output” and “input” in relation to the sending and receiving of a backpressure signal can be construed to also include such multiple wires.
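As one hypothetical illustration of a packetized backpressure message, the sketch below packs a message type, a rank number and a requested RFM count into a 4-byte big-endian record using Python's standard `struct` module. The field layout and the type code `0x01` are invented for illustration; the text does not define any packet format.

```python
import struct
from typing import Dict, Optional

BACKPRESSURE_TYPE = 0x01  # hypothetical message-type code for a backpressure packet
_FORMAT = ">BBH"          # type (1 byte), rank (1 byte), RFM count (2 bytes), big-endian


def encode_backpressure(rank: int, rfm_count: int) -> bytes:
    """Build a hypothetical backpressure packet for the given rank."""
    return struct.pack(_FORMAT, BACKPRESSURE_TYPE, rank, rfm_count)


def decode_packet(packet: bytes) -> Optional[Dict[str, int]]:
    """Return the backpressure fields, or None if the packet is another type."""
    msg_type, rank, rfm_count = struct.unpack(_FORMAT, packet)
    if msg_type != BACKPRESSURE_TYPE:
        return None
    return {"rank": rank, "rfm_count": rfm_count}
```

Carrying the rank and requested RFM count in the payload would give the controller, in one message, information that the pulse-width scheme obtains only by a separate MR read.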

The teachings provided above can be applied to various memory implementations including JEDEC DDR5 implementations, JEDEC DDR6 implementations, JEDEC graphics DDR (GDDR) implementations, JEDEC High Bandwidth Memory (HBM) implementations, etc.

The various types of circuitry described above can be implemented, at least partially, with logic circuitry. Logic circuitry can include logic gates and/or larger logic macros formed with such logic gates that are dedicated and hardwired, programmable or configurable logic circuitry such as field programmable gate array (FPGA) circuitry, and/or circuitry designed to execute some form of program code (e.g., micro-controller).

FIG. 6 depicts an example system. The system can use the teachings provided herein. System 600 includes processor 610, which provides processing, operation management, and execution of instructions for system 600. Processor 610 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 600, or a combination of processors. Processor 610 controls the overall operation of system 600, and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.

In one example, system 600 includes interface 612 coupled to processor 610, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 620 or graphics interface components 640, or accelerators 642. Interface 612 represents an interface circuit, which can be a standalone component or integrated onto a processor die. Where present, graphics interface 640 interfaces to graphics components for providing a visual display to a user of system 600. In one example, graphics interface 640 can drive a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others. In one example, the display can include a touchscreen display. In one example, graphics interface 640 generates a display based on data stored in memory 630 or based on operations executed by processor 610 or both.

Accelerators 642 can be a fixed function offload engine that can be accessed or used by a processor 610. For example, an accelerator among accelerators 642 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services. In some embodiments, in addition or alternatively, an accelerator among accelerators 642 provides field select controller capabilities as described herein. In some cases, accelerators 642 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU). For example, accelerators 642 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), “X” processing units (XPUs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Accelerators 642 can provide multiple neural networks, processor cores, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models. For example, the AI model can use or include any or a combination of: a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model.

Memory subsystem 620 represents the main memory of system 600 and provides storage for code to be executed by processor 610, or data values to be used in executing a routine. Memory subsystem 620 can include one or more memory devices 630 such as read-only memory (ROM), flash memory, volatile memory, or a combination of such devices. Memory 630 stores and hosts, among other things, operating system (OS) 632 to provide a software platform for execution of instructions in system 600. Additionally, applications 634 can execute on the software platform of OS 632 from memory 630. Applications 634 represent programs that have their own operational logic to perform execution of one or more functions. Processes 636 represent agents or routines that provide auxiliary functions to OS 632 or one or more applications 634 or a combination. OS 632, applications 634, and processes 636 provide software logic to provide functions for system 600. In one example, memory subsystem 620 includes memory controller 622, which is a memory controller to generate and issue commands to memory 630. It will be understood that memory controller 622 could be a physical part of processor 610 or a physical part of interface 612. For example, memory controller 622 can be an integrated memory controller, integrated onto a circuit with processor 610. In some examples, a system on chip (SOC or SoC) combines into one SoC package one or more of: processors, graphics, memory, memory controller, and Input/Output (I/O) control logic.

A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (Dynamic Random Access Memory), or some variant such as Synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electron Device Engineering Council) on Jun. 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The JEDEC standards are available at www.jedec.org.

The memory 630 could include one or more memory chips designed to send a backpressure signal, and the memory controller 622 could respond to the backpressure signal according to the teachings described at length above with respect to FIGS. 3 through 5.

While not specifically illustrated, it will be understood that system 600 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components. Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination. Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect express (PCIe) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, Remote Direct Memory Access (RDMA), Internet Small Computer Systems Interface (iSCSI), NVM express (NVMe), Compute Express Link (CXL), Coherent Accelerator Processor Interface (CAPI), a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus.

In one example, system 600 includes interface 614, which can be coupled to interface 612. In one example, interface 614 represents an interface circuit, which can include standalone components and integrated circuitry. In one example, multiple user interface components or peripheral components, or both, couple to interface 614. Network interface 650 provides system 600 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks. Network interface 650 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces. Network interface 650 can transmit data to a remote device, which can include sending data stored in memory. Network interface 650 can receive data from a remote device, which can include storing received data into memory. Various embodiments can be used in connection with network interface 650, processor 610, and memory subsystem 620.

In one example, system 600 includes one or more input/output (I/O) interface(s) 660. I/O interface 660 can include one or more interface components through which a user interacts with system 600 (e.g., audio, alphanumeric, tactile/touch, or other interfacing). Peripheral interface 670 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 600. A dependent connection is one where system 600 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.

In one example, system 600 includes storage subsystem 680 to store data in a nonvolatile manner. In one example, in certain system implementations, at least certain components of storage 680 can overlap with components of memory subsystem 620. Storage subsystem 680 includes storage device(s) 684, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Storage 684 holds code or instructions and data 686 in a persistent state (e.g., the value is retained despite interruption of power to system 600). Storage 684 can be generically considered to be a “memory,” although memory 630 is typically the executing or operating memory to provide instructions to processor 610. Whereas storage 684 is nonvolatile, memory 630 can include volatile memory (e.g., the value or state of the data is indeterminate if power is interrupted to system 600). In one example, storage subsystem 680 includes controller 682 to interface with storage 684. In one example, controller 682 is a physical part of interface 614 or processor 610 or can include circuits or logic in both processor 610 and interface 614.

A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Tri-Level Cell (“TLC”), Quad-Level Cell (“QLC”), or some other NAND). An NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Torque) based device, a thyristor based memory device, or a combination of any of the above, or other memory.

A power source (not depicted) provides power to the components of system 600. More specifically, the power source typically interfaces to one or multiple power supplies in system 600 to provide power to the components of system 600. In one example, the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet. Such AC power can come from a renewable energy (e.g., solar power) power source. In one example, the power source includes a DC power source, such as an external AC to DC converter. In one example, the power source or power supply includes wireless charging hardware to charge via proximity to a charging field. In one example, the power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.

In an example, system 600 can be implemented as a disaggregated computing system. For example, system 600 can be implemented with interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof). For example, the sleds can be designed according to any specifications promulgated by the Open Compute Project (OCP) or other disaggregated computing effort, which strives to modularize main architectural computer components into rack-pluggable components (e.g., a rack pluggable processing component, a rack pluggable memory component, a rack pluggable storage component, a rack pluggable accelerator component, etc.).

FIG. 7 depicts an example of a data center. Various embodiments can be used in or with the data center of FIG. 7. As shown in FIG. 7, data center 700 may include an optical fabric 712. Optical fabric 712 may generally include a combination of optical signaling media (such as optical cabling) and optical switching infrastructure via which any particular sled in data center 700 can send signals to (and receive signals from) the other sleds in data center 700. However, optical, wireless, and/or electrical signals can be transmitted using fabric 712. The signaling connectivity that optical fabric 712 provides to any given sled may include connectivity both to other sleds in a same rack and sleds in other racks. Data center 700 includes four racks 702A to 702D, and racks 702A to 702D house respective pairs of sleds 704A-1 and 704A-2, 704B-1 and 704B-2, 704C-1 and 704C-2, and 704D-1 and 704D-2. Thus, in this example, data center 700 includes a total of eight sleds. Optical fabric 712 can provide sled signaling connectivity with one or more of the seven other sleds. For example, via optical fabric 712, sled 704A-1 in rack 702A may possess signaling connectivity with sled 704A-2 in rack 702A, as well as the six other sleds 704B-1, 704B-2, 704C-1, 704C-2, 704D-1, and 704D-2 that are distributed among the other racks 702B, 702C, and 702D of data center 700. The embodiments are not limited to this example. For example, fabric 712 can provide optical and/or electrical signaling.

FIG. 8 depicts an environment 800 that includes multiple computing racks 802, each including a Top of Rack (ToR) switch 804, a pod manager 806, and a plurality of pooled system drawers. Generally, the pooled system drawers may include pooled compute drawers and pooled storage drawers to, e.g., effect a disaggregated computing system. Optionally, the pooled system drawers may also include pooled memory drawers and pooled Input/Output (I/O) drawers. In the illustrated embodiment, the pooled system drawers include an INTEL® XEON® pooled compute drawer 808, an INTEL® ATOM™ pooled compute drawer 810, a pooled storage drawer 812, a pooled memory drawer 814, and a pooled I/O drawer 816. Each of the pooled system drawers is connected to ToR switch 804 via a high-speed link 818, such as a 40 Gigabit/second (Gb/s) or 100 Gb/s Ethernet link or a 100+ Gb/s Silicon Photonics (SiPh) optical link. In one embodiment, high-speed link 818 comprises an 800 Gb/s SiPh optical link.

Again, the drawers can be designed according to any specifications promulgated by the Open Compute Project (OCP) or other disaggregated computing effort, which strives to modularize main architectural computer components into rack-pluggable components (e.g., a rack pluggable processing component, a rack pluggable memory component, a rack pluggable storage component, a rack pluggable accelerator component, etc.).

Multiple computing racks 802 may be interconnected via their ToR switches 804 (e.g., to a pod-level switch or data center switch), as illustrated by connections to a network 820. In some embodiments, groups of computing racks 802 are managed as separate pods via pod manager(s) 806. In one embodiment, a single pod manager is used to manage all of the racks in the pod. Alternatively, distributed pod managers may be used for pod management operations.

RSD environment 800 further includes a management interface 822 that is used to manage various aspects of the RSD environment. This includes managing rack configuration, with corresponding parameters stored as rack configuration data 824.

Embodiments herein may be implemented in various types of computing devices, smart phones, tablets, personal computers, and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment. The servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet. For example, cloud hosting facilities may typically employ large data centers with a multitude of servers. A blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card.” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (e.g., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.

Various examples may be implemented using hardware elements, software elements, or a combination of both. In some examples, hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some examples, software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “module,” “logic,” “circuit,” or “circuitry.”

Some examples may be implemented using or as an article of manufacture or at least one computer-readable medium. A computer-readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writable or re-writable memory, and so forth. In some examples, the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.

According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.

One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.

The appearances of the phrase “one example” or “an example” are not necessarily all referring to the same example or embodiment. Any aspect described herein can be combined with any other aspect or similar aspect described herein, regardless of whether the aspects are described with respect to the same figure or element. Division, omission or inclusion of block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.

Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.

The terms “first,” “second,” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The term “asserted” used herein with reference to a signal denotes a state of the signal in which the signal is active, and which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal. The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.

Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”

1. A memory chip, comprising: row hammer threat detection circuitry; an output; backpressure signal generation circuitry coupled between the row hammer threat detection circuitry and the output, the backpressure signal generation circuitry to generate a backpressure signal to be sent from the output in response to detection by the row hammer threat detection circuitry of a row hammer threat.
 2. The memory chip of claim 1 wherein the output is an alert_n output.
 3. The memory chip of claim 1 wherein the row hammer threat comprises a plurality of rows of a memory cell array within the memory chip concurrently reaching a row activation threshold.
 4. The memory chip of claim 1 wherein the backpressure signal comprises a pulse having a width between a pulse width minimum and a pulse width maximum.
 5. The memory chip of claim 4 wherein the memory chip comprises mode register (MR) space to record the pulse width minimum.
 6. The memory chip of claim 1 wherein the memory chip comprises MR space to provide information concerning the row hammer threat.
 7. The memory chip of claim 6 wherein the information is to describe a time window within which a memory controller is to cease sending read, write and activation commands to the memory chip as a consequence of the memory controller having received the backpressure signal.
 8. The memory chip of claim 6 wherein the information is to affect how many refresh commands a memory controller is to send to the memory chip in response to the backpressure signal.
 9. The memory chip of claim 8 wherein the refresh commands are refresh management (RFM) commands.
 10. The memory chip of claim 1 wherein the backpressure signal is to be sent in a packet over the output.
 11. A memory controller, comprising: an input; backpressure signal detection circuitry coupled to the input, the backpressure signal detection circuitry to detect a backpressure signal received at the input; and, command circuitry coupled to the backpressure signal detection circuitry, the command circuitry to stop sending read, write and activation commands to a memory chip as a consequence of the memory controller having received the backpressure signal, the memory chip having generated the backpressure signal in response to a row hammer threat.
 12. The memory controller of claim 11 wherein the input is an alert_n input.
 13. The memory controller of claim 11 wherein the backpressure signal comprises a pulse having a width between a pulse width minimum and a pulse width maximum.
 14. The memory controller of claim 11 wherein the memory controller is to read MR space within the memory chip to obtain information concerning the row hammer threat.
 15. The memory controller of claim 11 wherein the memory controller is to stop sending the read, write and activation commands for a predefined time period.
 16. The memory controller of claim 11 wherein the memory controller is to send refresh commands to the memory chip in response to the backpressure signal.
 17. The memory controller of claim 11 wherein the memory controller is also to receive a cyclic redundancy check write error signal and a command/address parity error signal at the input.
 18. A computing system, comprising: a) a plurality of processing cores; b) a network interface; c) a memory chip, the memory chip comprising: i) row hammer threat detection circuitry; ii) an output; iii) backpressure signal generation circuitry coupled between the row hammer threat detection circuitry and the output, the backpressure signal generation circuitry to generate a backpressure signal to be sent from the output in response to detection by the row hammer threat detection circuitry of a row hammer threat; d) a memory controller coupled to the memory chip, the memory controller comprising: i) an input; ii) backpressure signal detection circuitry coupled to the input, the backpressure signal detection circuitry to detect the backpressure signal received at the input; and, iii) command circuitry coupled to the backpressure signal detection circuitry, the command circuitry to stop sending read, write and activation commands to the memory chip as a consequence of the memory controller having received the backpressure signal.
 19. The computing system of claim 18 wherein the output is an alert_n output.
 20. The computing system of claim 18 wherein the row hammer threat comprises a plurality of rows of a memory cell array within the memory chip concurrently reaching a row activation threshold.
 21. The computing system of claim 18 wherein the memory controller is to send refresh commands to the memory chip in response to the backpressure signal.
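The host-side response recited in the claims (detecting a backpressure pulse on the alert_n input, withholding read, write and activation commands, and sending RFM commands) can be illustrated with a small behavioral model. The following Python sketch is a hypothetical illustration only, not the claimed circuitry or any JEDEC-defined procedure. It assumes that alert_n is active low, that a backpressure pulse is recognized because its width falls within a pulse width minimum and maximum read from MR space, that other alert_n pulses (e.g., CRC write error or command/address parity error signals) have widths outside that range, and that MR space also supplies the blocking window and the number of RFM commands to send. The names BackpressureParams, HostScheduler, classify_pulse and low_pulse_width, the cycle-based units, and the command mnemonics are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AlertKind(Enum):
    """Hypothetical classification of an alert_n pulse."""
    BACKPRESSURE = auto()  # row hammer threat backpressure signal
    OTHER = auto()         # e.g., CRC write error or CA parity error


@dataclass
class BackpressureParams:
    """Values the host is assumed to have read from the chip's MR space.

    Field names and units (clock cycles) are illustrative assumptions.
    """
    pw_min: int        # minimum backpressure pulse width
    pw_max: int        # maximum backpressure pulse width
    block_window: int  # cycles during which RD/WR/ACT are withheld
    rfm_count: int     # RFM commands to send per backpressure pulse


def low_pulse_width(samples: list[int]) -> int:
    """Width, in samples, of the first run of 0s on an active-low alert_n line."""
    width = 0
    for level in samples:
        if level == 0:
            width += 1
        elif width:
            break  # the first low pulse has ended
    return width


def classify_pulse(width: int, p: BackpressureParams) -> AlertKind:
    """Treat a pulse whose width lies in [pw_min, pw_max] as backpressure."""
    if p.pw_min <= width <= p.pw_max:
        return AlertKind.BACKPRESSURE
    return AlertKind.OTHER


class HostScheduler:
    """Minimal host-side command gate (a sketch, not a real memory controller)."""

    BLOCKED = frozenset({"RD", "WR", "ACT"})  # commands withheld per the claims

    def __init__(self, params: BackpressureParams):
        self.params = params
        self.blocked_until = 0  # first cycle at which RD/WR/ACT may resume
        self.pending_rfm = 0    # RFM commands still to be sent to the chip

    def on_alert(self, now: int, width: int) -> AlertKind:
        """Handle an alert_n pulse of the given width observed at cycle `now`."""
        kind = classify_pulse(width, self.params)
        if kind is AlertKind.BACKPRESSURE:
            # A repeated pulse extends the window and adds more RFM commands.
            self.blocked_until = max(self.blocked_until,
                                     now + self.params.block_window)
            self.pending_rfm += self.params.rfm_count
        return kind

    def next_command(self, now: int, requested: str) -> str:
        """Return the command actually issued at cycle `now`."""
        if self.pending_rfm:
            self.pending_rfm -= 1
            return "RFM"
        if requested in self.BLOCKED and now < self.blocked_until:
            return "NOP"
        return requested
```

In this sketch, a cycle that issues an RFM command or returns NOP does not consume the requested command, so the caller is assumed to retry it later. Commands the claims do not name, such as precharge, are left unblocked.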